AITopics | group token

Supplementary Materials

Neural Information Processing Systems, Feb-17-2026, 18:02:47 GMT

During inference, we set the background score to 0.95.
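The snippet above gives only the value of the background score, not how it is applied. As a minimal sketch, assuming the score acts as a fixed threshold that a foreground class must exceed (a common convention in group-token segmentation pipelines, not confirmed by this excerpt), per-pixel labeling could look like the following. The function name `label_pixels` and the label layout are hypothetical.

```python
BACKGROUND_SCORE = 0.95  # value stated in the source; its use below is an assumption


def label_pixels(class_scores, background_score=BACKGROUND_SCORE):
    """Assign one label per pixel.

    class_scores: list of per-pixel lists of foreground scores in [0, 1].
    Returns 0 for background, or k + 1 for foreground class k.
    """
    labels = []
    for scores in class_scores:
        # Index of the highest-scoring foreground class for this pixel.
        best = max(range(len(scores)), key=lambda k: scores[k])
        # Assumed rule: a pixel is foreground only if its best score
        # strictly exceeds the fixed background score.
        labels.append(best + 1 if scores[best] > background_score else 0)
    return labels
```

With a threshold this high, only pixels whose best class score is very confident are kept as foreground; everything else falls back to background.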

artificial intelligence, group token, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Neural Information Processing Systems, Feb-17-2026, 18:02:44 GMT

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in an all-to-one vs. one-to-one manner during the training and inference phases, respectively.

artificial intelligence, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Neural Information Processing Systems, Dec-27-2025, 02:31:49 GMT

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in an all-to-one vs. one-to-one manner during the training and inference phases, respectively.

group token, uncovering prototypical knowledge, weakly open-vocabulary semantic segmentation, (5 more...)

Neural Information Processing Systems

Genre: Research Report (0.39)

Technology: Information Technology > Artificial Intelligence (0.39)

e95eb5206c867be843fbc14bbfe8c10e-Paper-Conference.pdf

Neural Information Processing Systems, Oct-9-2025, 10:38:54 GMT

artificial intelligence, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(3 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Neural Information Processing Systems, Jan-20-2025, 01:29:12 GMT

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in an all-to-one vs. one-to-one manner during the training and inference phases, respectively. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge.

group token, uncovering prototypical knowledge, weakly open-vocabulary semantic segmentation, (3 more...)

Neural Information Processing Systems

Genre: Research Report (0.41)

Technology: Information Technology > Artificial Intelligence (0.41)

Perceptual Group Tokenizer: Building Perception with Iterative Grouping

Deng, Zhiwei, Chen, Ting, Li, Yang

arXiv.org Artificial Intelligence, Jan-24-2024

The human visual recognition system shows an astonishing capability of compressing visual information into a set of tokens containing rich representations without label supervision. One critical driving principle behind it is perceptual grouping (Palmer, 2002; Wagemans et al., 2012; Herzog, 2018). Despite being widely used in computer vision in the early 2010s, it remains a mystery whether perceptual grouping can be leveraged to derive a neural visual recognition backbone that generates equally powerful representations. In this paper, we propose the Perceptual Group Tokenizer, a model that relies entirely on grouping operations to extract visual features and perform self-supervised representation learning, where a series of grouping operations is used to iteratively hypothesize the context for pixels or superpixels to refine feature representations. We show that the proposed model can achieve competitive performance compared to state-of-the-art vision architectures, and inherits desirable properties including adaptive computation without re-training, and interpretability. Specifically, Perceptual Group Tokenizer achieves 80.3% on the ImageNet-1K self-supervised learning benchmark with linear probe evaluation, establishing a new milestone for this paradigm.

Ever since the surge of deep learning, feature detection has dominated the vision field, become the main principle behind representation learning backbone designs, and made impressive progress (Simonyan & Zisserman, 2014; Szegedy et al., 2015; He et al., 2016; Chen et al., 2017; Tan & Le, 2019; Qi et al., 2020; Liu et al., 2022b). The success of the former paradigm, although striking, raises the question of whether perceptual grouping can also be used as the driving principle to construct a visual recognition model. Different from detecting and selecting distinctive features, perceptual grouping emphasizes learning a feature space in which the similarity of all pixels can be effectively measured (Uijlings et al., 2013; Arbeláez et al., 2014). With such a feature space, semantically meaningful objects and regions can be easily discovered with a simple grouping algorithm and used as a compact set to represent an image (Uijlings et al., 2013; Arbeláez et al., 2014; Locatello et al., 2020).

group token, operation, representation, (14 more...)

arXiv.org Artificial Intelligence

2311.18296

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation

Zhang, Fei, Zhou, Tianfei, Li, Boyang, He, Hao, Ma, Chaofan, Zhang, Tianjiao, Yao, Jiangchao, Zhang, Ya, Wang, Yanfeng

arXiv.org Artificial Intelligence, Oct-29-2023

This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using only image-text pairs. Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in an all-to-one vs. one-to-one manner during the training and inference phases, respectively. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge. To this end, this paper proposes the non-learnable prototypical regularization (NPR), where non-learnable prototypes are estimated from source features to serve as supervision and enable contrastive matching of the group tokens. This regularization encourages the group tokens to segment objects with less redundancy and capture more comprehensive semantic regions, leading to increased compactness and richness. Based on NPR, we propose the prototypical guidance segmentation network (PGSeg), which incorporates multi-modal regularization by leveraging prototypical sources from both images and texts at different levels, progressively enhancing the segmentation capability with diverse prototypical patterns. Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets. The source code is available at https://github.com/Ferenas/PGSeg.
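To make the idea of non-learnable prototypes with contrastive matching more concrete, here is a rough sketch. It is not the PGSeg implementation: the abstract does not say how prototypes are estimated or matched, so this sketch assumes a k-means-style clustering with cosine assignment, a greedy one-to-one matching, and an InfoNCE-style loss. All function names and the temperature value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def estimate_prototypes(features, k, iters=10):
    """k-means-style estimate with no learnable parameters.

    Initialised from the first k features (an arbitrary, deterministic choice).
    """
    protos = [list(f) for f in features[:k]]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for f in features:
            nearest = max(range(k), key=lambda c: cosine(f, protos[c]))
            buckets[nearest].append(f)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous prototype
                dim = len(members[0])
                protos[c] = [sum(m[d] for m in members) / len(members)
                             for d in range(dim)]
    return protos


def greedy_match(tokens, protos):
    """One-to-one token-to-prototype assignment, most similar pairs first."""
    pairs = sorted(((cosine(t, p), i, j)
                    for i, t in enumerate(tokens)
                    for j, p in enumerate(protos)), reverse=True)
    used_tokens, used_protos, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_tokens and j not in used_protos:
            match[i] = j
            used_tokens.add(i)
            used_protos.add(j)
    return match


def contrastive_loss(tokens, protos, match, temperature=0.1):
    """InfoNCE-style loss pulling each token toward its matched prototype."""
    total = 0.0
    for i, j in match.items():
        logits = [cosine(tokens[i], p) / temperature for p in protos]
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[j]
    return total / len(match)
```

In a training loop, the prototypes would be treated as fixed targets (no gradient), so that only the group tokens move. The loss is lowest when each group token sits close to a distinct prototype, which is one way to read the "less redundancy" goal stated in the abstract.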

group token, prototype, semantic segmentation, (10 more...)

arXiv.org Artificial Intelligence

2310.19001

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

Li, Xin, Lan, Cuiling, Wei, Guoqiang, Chen, Zhibo

arXiv.org Artificial Intelligence, Dec-5-2022

The vision transformer has demonstrated great potential in many vision tasks. However, it inevitably suffers from poor generalization when a distribution shift occurs at test time (i.e., on out-of-distribution data). To mitigate this issue, we propose a novel method, Semantic-aware Message Broadcasting (SAMB), which enables more informative and flexible feature alignment for unsupervised domain adaptation (UDA). In particular, we study the attention module in the vision transformer and notice that the alignment space using one global class token lacks flexibility, since it exchanges information with all image tokens in the same manner and ignores the rich semantics of different regions. In this paper, we aim to improve the richness of the alignment features by enabling semantic-aware adaptive message broadcasting. Specifically, we introduce a set of learned group tokens as nodes to aggregate the global information from all image tokens, but encourage different group tokens to adaptively focus on the message broadcasting to different semantic regions. In this way, our message broadcasting encourages the group tokens to learn more informative and diverse information for effective domain alignment. Moreover, we systematically study the effects of adversarial-based feature alignment (ADA) and pseudo-label based self-training (PST) on UDA. We find that a simple two-stage training strategy combining ADA and PST can further improve the adaptation capability of the vision transformer. Extensive experiments on DomainNet, OfficeHome, and VisDA-2017 demonstrate the effectiveness of our methods for UDA.
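The aggregation step described above, where group tokens gather information from all image tokens, can be illustrated with plain scaled dot-product attention. This is only a sketch under simplifying assumptions: it omits the learned query/key/value projections and the broadcasting step of SAMB, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(group_tokens, image_tokens):
    """Each group token attends over all image tokens.

    Returns one aggregated vector per group token: an attention-weighted
    mean of the image tokens, so different group tokens can focus on
    different regions.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    aggregated = []
    for g in group_tokens:
        weights = softmax([scale * sum(a * b for a, b in zip(g, t))
                           for t in image_tokens])
        aggregated.append([sum(w * t[d] for w, t in zip(weights, image_tokens))
                           for d in range(dim)])
    return aggregated
```

Because each group token has its own query vector, two group tokens pointed in different directions end up summarizing different subsets of image tokens, whereas a single class token would produce only one such summary.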

artificial intelligence, domain adaptation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2212.02739

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.70)

Industry: Health & Medicine > Diagnostic Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
